ABSTRACT



The analysis of interaction effects in multiple regression 
has received considerable attention in recent years, but problems with the 
valid identification of moderating variables have been noted by researchers. 
G. McClelland and C. Judd (1993), in their discussion of the statistical 
difficulties of detecting interactions and moderating effects, warned against 
the use of a four-corners subsample approach to moderated multiple 
regression, but they did not present empirical evidence that such an approach 
provides less power than the use of the full random sample. This study was 
conducted to produce evidence of the extent of power loss that is associated 
with the subsample strategy. The effectiveness of the four-corners subsample 
procedure was investigated through a Monte Carlo study that used regression 
models to generate data from populations with linear, nonlinear, and 
nonadditive relationships. In all, 2,304 conditions were examined, for 3 
models, 4 levels of population "R" squared, 4 levels of regressor 
correlation, 4 levels of regressor reliability, 3 levels of sample size, and 
4 levels of effect size for the nonlinear or nonadditive component. Results 
suggest that the use of the four-corners strategy rather than full sample 
analysis shows better specificity at the expense of reduced statistical 
power, or sensitivity, relative to full sample analysis. Despite the improved 
specificity of the four-corners approach, model misidentification rates were 
high in many of the conditions examined. The utility of either the 
four-corners approach or the full sample approach for testing theory is 
limited. (Contains 3 tables and 26 references.) (SLD) 
The McClelland and Judd Approach: 

Using “Four-corners” Data to Detect Nonlinearity and Nonadditivity 



As Manly (1992) noted, “...in a multiple regression analysis, a single variable Y is related 
to two or more variables to see how Y is related to the X’s.” Problems can arise, 
however, when interpreting results based on observations that are not like those 
from the population of interest (i.e., an atypical sample), or when the functional form of the 
relationship between the predictors and the criterion variable is specified incorrectly. Two 
examples of the latter are nonlinearity and nonadditivity. Nonlinearity in the model occurs when 
the regression of Y on at least one X variable depends upon the value of that variable (either 
accelerating or decelerating). Cortina (1993) relates, “... the possibility of nonlinear relationships 
continues to go relatively unexplored. For this reason, interpretation of significant interaction 
terms in multiple regression may be difficult ...” Budescu (1980) reported, “as the degree of 
collinearity increases, the results of the analysis become more and more a function of the internal 
relations between the predictors...”. Saunders (1955) was first to devise a method to test 
interactions (moderator effects) and called his invention “moderated multiple regression.” 
Customarily, it is necessary to test for an interaction (moderator effect) when the effect of one 
variable, X, on a second variable, Y, seems to depend on the level of a third variable, Z. The 
problem of nonadditivity in the model refers to a product term consisting of two predictors 
multiplied together, which creates a joint effect of these independent variables on the dependent 
variable. Nonlinearity and nonadditivity are referred to as specification errors when there is a 
lack of proper congruence between the sample regression model and the population. There exist 
both true (correctly specified) and misspecified models. “The rub, however, is that the true model 
is seldom, if ever, known” (Pedhazur, 1982). 
Detecting Nonlinearity and Nonadditivity 

The analysis of interaction effects in multiple regression has received considerable 
attention in recent years (e.g., Aiken & West, 1991; Jaccard & Wan, 1995; McClelland & Judd, 
1993), although the methods for such analyses have been known for at least 40 years (Saunders, 
1955). An interaction effect indicates that the relation between a criterion variable (Y) and a 
predictor variable (X) varies as a function of some third variable (Z). This third variable is 
commonly referred to as a moderator (Saunders, 1955). Moderator variables are common in 
behavioral research (Baron & Kenny, 1986). For example, Pearlin, Menaghan, Lieberman, and 
Mullan (1981) hypothesized moderating effects for both coping responses and social support on 
the relationship between stressful events and health. Similarly, Findley and Cooper (1983) 
hypothesized that the relationship between locus of control and academic achievement is 
moderated by demographic factors such as gender, race, and socio-economic status. 

A statistical test for interaction (or moderator) effects is usually accomplished with 
hierarchical multiple regression, in which differences in sample $R^2$ values between an additive 
model and a nonadditive model are tested (equivalently, for a single moderator component, the 
test of the regression weight for the product term may be used). Although alternative testing 
procedures have been recommended in the literature, such procedures subsequently have been 
shown to be incorrect (e.g., Cronbach, 1987; Dunlap & Kemery, 1987). As McClelland and Judd 
(1993) asserted, there has been "no credible published refutation of the appropriateness of 
[hierarchical multiple regression] as a test of moderator effects" (p. 377). 
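
The hierarchical test described above can be sketched as an F test on the increment in $R^2$ between the additive and the nonadditive model. The following is a minimal illustration rather than the authors' SAS code; the function name `r2_change_f` and the example values are hypothetical.

```python
def r2_change_f(r2_reduced, r2_full, n, k_reduced, k_full):
    """F statistic for the increase in R^2 when predictors are added.

    r2_reduced, r2_full: sample R^2 of the additive and the larger model
    n: sample size; k_reduced, k_full: number of predictors (no intercept)
    """
    df_num = k_full - k_reduced          # number of added terms
    df_den = n - k_full - 1              # residual df of the larger model
    f_stat = ((r2_full - r2_reduced) / df_num) / ((1.0 - r2_full) / df_den)
    return f_stat, df_num, df_den


# Hypothetical example: adding one product term (XZ) to a two-predictor model.
f_stat, df1, df2 = r2_change_f(r2_reduced=0.26, r2_full=0.29,
                               n=175, k_reduced=2, k_full=3)
print(f"F({df1}, {df2}) = {f_stat:.2f}")
```

With a single added term, this F statistic is the square of the t statistic for the product term's regression weight, which is why the two tests are equivalent in that case.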

Problems with the valid identification of moderating variables have been noted by both 
research methodologists and applied researchers. Some of the more frequently encountered 
difficulties in statistically detecting such effects in non-experimental research have been 
attributed to measurement error (Dunlap & Kemery, 1988; Jaccard & Wan, 1995), 
multicollinearity (Morris, Sherman & Mansfield, 1986), low residual variance of the product 
term in the regression equation (McClelland & Judd, 1993), residual variance heterogeneity 
(Alexander & DeShon, 1994), and even a natural consequence of multivariate normality (Fisicaro 
& Tisak, 1994). 

McClelland and Judd (1993) noted the relative ease with which interaction effects are 
apparently detected in experimental research, in contrast with the difficulties of their detection in 
field studies. These authors attributed the power deficits seen in field research to a lack of 
residual variance in the product term used in moderated multiple regression, an effect attributable 
to the use of nonoptimal distributions of regressor variables in field research. That is, 
experimental research is characterized by observations occurring at extreme values of the 
regressor variables, while field research is characterized by observations occurring at more 
moderate values. 

McClelland and Judd (1993) clearly warned against naive applications of their ideas in 
field research. For example, artificially dichotomizing regressor variables does not make the 
observations on those variables truly extreme. Further, Maxwell and Delaney (1993) 
demonstrated that such dichotomization can easily distort the relationships between variables. A 
second "unwise strategy" noted by McClelland and Judd is the collection of a random sample of 
data in field research from which an approximately optimal subsample is obtained. This 
subsample of data, from the four corners of the bivariate distribution, is then analyzed using 
moderated multiple regression. Although intuitively appealing to some extent, the use of such a 
subsample is likely to lead to even less statistical power (because of the smaller sample size) than 
that obtained from the random sample itself (despite the nonoptimal distribution in the random 
sample). 

Although McClelland and Judd (1993) warned against the use of a 4-corners subsample 
approach to moderated multiple regression, they did not present empirical evidence that such an 
approach provides less power than the use of the full random sample. The present study was 
designed to produce evidence of the extent of power loss that is associated with the subsample 
strategy. 

A Variety of Potential Regression Models 

Consider a multiple regression equation with two regressors, X and Z. If the linear regression 
model, $y = a + \beta_1 X + \beta_2 Z + e$, is fit to a set of data, and inspection of (for example) partial 
regression plots indicates departure from linearity, the researcher is not certain which of the 
following models may accurately describe the relationship between the regressors and the 
dependent variable: 

a) $y = a + \beta_1 X + \beta_2 Z + \beta_3 XZ + e$ (moderation) 

b) $y = a + \beta_1 X + \beta_2 Z + \beta_3 X^2 + e$ (nonlinearity in X) 

c) $y = a + \beta_1 X + \beta_2 Z + \beta_3 Z^2 + e$ (nonlinearity in Z) 

d) $y = a + \beta_1 X + \beta_2 Z + \beta_3 X^2 + \beta_4 Z^2 + e$ (nonlinearity in X and Z) 

e) $y = a + \beta_1 X + \beta_2 Z + \beta_3 X^2 + \beta_4 Z^2 + \beta_5 XZ + e$ (nonlinearity and nonadditivity) 

If a researcher lacks a theoretical reason for expecting a particular functional form of the 
relationship, each model may be inspected to determine whether a nonlinear model or 
the moderation model better describes the relationship between the regressors and the dependent 
variable. The underlying relationship may be best represented by a moderated equation (model 
a), suggesting that the relationship between the outcome variable and each of the regressors 
depends on the value of the other regressor. In contrast, the underlying relations may be best 
represented by a nonlinear relationship between one of the regressors and the criterion variable 
(models b and c), by nonlinear relationships between both regressors and the criterion (model d), 
or by a combination of nonlinearity and moderation (model e). 
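
To make the competing models concrete, the sketch below fits the additive model and models (a) through (e) to one simulated sample and reports each sample $R^2$. It is an illustration under stated assumptions, not the procedure used in this study: the population coefficients, the sample size, and the helper `ols_r2` (a small least-squares routine written with the standard library only) are all hypothetical. Comparing $R^2$ values alone is descriptive; the study itself tests the significance of the added terms.

```python
import random


def ols_r2(columns, y):
    """R^2 from a least-squares fit of y on the given columns plus an intercept."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(rows[0])
    # Augmented normal equations: (X'X | X'y).
    aug = [[sum(r[j] * r[k] for r in rows) for k in range(p)]
           + [sum(r[j] * yi for r, yi in zip(rows, y))] for j in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(p):
            if r != c:
                factor = aug[r][c] / aug[c][c]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[c])]
    beta = [aug[j][p] / aug[j][j] for j in range(p)]
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(200)]
z = [rng.gauss(0, 1) for _ in range(200)]
# Hypothetical population: model (a), moderation, with unit-variance error.
y = [1 + 0.5 * xi + 0.5 * zi + 0.4 * xi * zi + rng.gauss(0, 1)
     for xi, zi in zip(x, z)]

xz = [xi * zi for xi, zi in zip(x, z)]
x2 = [xi ** 2 for xi in x]
z2 = [zi ** 2 for zi in z]

candidates = {
    "additive": [x, z],
    "a: XZ": [x, z, xz],
    "b: X^2": [x, z, x2],
    "c: Z^2": [x, z, z2],
    "d: X^2, Z^2": [x, z, x2, z2],
    "e: X^2, Z^2, XZ": [x, z, x2, z2, xz],
}
r2 = {name: ols_r2(cols, y) for name, cols in candidates.items()}
for name, value in r2.items():
    print(f"{name:<18} R^2 = {value:.3f}")
```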

Selection of the wrong model based upon sample data may be considered a Type I error, 
a Type II error, or a lack of specificity of the test used. For example, if the population from which 
the sample was drawn is accurately characterized by a linear, additive model, the selection of any 
nonlinear or nonadditive model (a through e) represents a Type I error. Conversely, a Type II 
error may result if the population is best characterized by a nonlinear or nonadditive model, but 
none of the nonlinear or nonadditive models provide a sufficient increase in $R^2$ relative to the 
additive model. Finally, if the population is best characterized by a nonlinear model, but the 
researcher selects a moderated model based upon the sample data, a lack of specificity is evident. 
A test with good specificity will lead to rejecting the null hypothesis associated with the actual 
population model, but not rejecting null hypotheses associated with other models. 

A number of factors are related to the lack of specificity in moderated multiple 
regression, but probably the greatest contributor to such errors is the presence of measurement 
error in the instruments used to represent the phenomenon being investigated. The importance of 
measurement error in selecting the best-fitting model from competing models has been discussed 
previously in great detail (cf. Busemeyer & Jones, 1983; MacCallum & Mar, 1995), particularly 
for those models that incorporate multiplicative composite terms. Essentially, the reliability of 
the product terms is a joint function of the reliabilities of the components (X and Z) and the 
correlation between the components. 

If X and Z are not correlated, then the reliability of the product term (XZ) is equal to the 
product of the separate reliabilities of X and Z. Thus, if the separate reliabilities of the component 
terms are relatively high, then the reliability of the product term will be high. Conversely, low 
component reliabilities will result in low composite reliability scores, and an increased 
probability of committing a Type II error by not detecting an effect when one is present. As the 
reliability of each component increases, the reliability of the composite increases and the Type II 
error probability decreases. 
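
The dependence of product-term reliability on the component reliabilities can be illustrated numerically. The sketch below assumes the expression commonly given for centered, bivariate normal regressors with uncorrelated measurement errors, (rel_X * rel_Z + r^2) / (1 + r^2), where r is the correlation between the observed regressors. It reduces to the product of the two reliabilities when r = 0, as stated above. The general form and the function name `product_reliability` are assumptions that are not taken from this section.

```python
def product_reliability(rel_x, rel_z, r_xz=0.0):
    """Reliability of the product XZ (assumed formula; see lead-in).

    rel_x, rel_z: reliabilities of the component regressors
    r_xz: correlation between the observed regressors
    """
    return (rel_x * rel_z + r_xz ** 2) / (1.0 + r_xz ** 2)


# Uncorrelated regressors: reliability is the product of the components.
for rel in (0.4, 0.6, 0.8, 1.0):
    print(f"component reliability {rel:.2f} -> product term "
          f"{product_reliability(rel, rel):.2f}")
```

Under this assumed formula, even moderately reliable components (for example, .60 each) yield a product term with reliability well below .50 when the regressors are uncorrelated.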

This phenomenon is also true of nonlinear effects. As has been pointed out by Shepperd 
(1991), the quadratic composite term in a regression model would also suffer from unreliability if 
the component terms were unreliable, the reliability of the composite term $X^2$ being equal to the 
square of the reliability of X. The effects of such unreliability on the quadratic term are the same 
as the effects on the cross-product term noted above — a reduction in statistical power and a 
concomitant increase in the probability of a Type II error. 

The reliability problem is compounded when the regressor variables are correlated with 
each other. As has been noted many times, as the correlation between X and Z increases, the 
quadratic term ($X^2$) and the interaction term (XZ) will share substantial variance and will become 

Method 

The effectiveness of the 4-corners subsample procedure was investigated through a 
Monte Carlo study that used regression models to generate data from populations evidencing 
(a) linear, (b) non-linear, and (c) non-additive relationships. In addition to the functional form of 
relationship characteristic of the population, five factors were manipulated in the study: (a) the 
correlation between the regressor variables, (b) the overall population $R^2$ of the additive 
component of the regression model, (c) the effect size of the non-linear or interaction terms ($X^2$, 
$Z^2$, and XZ), (d) the reliabilities of the regressors, and (e) sample size. Only models with two 
regressor variables were included in the study. 

The magnitude of the correlation between the two regressors was controlled at levels 
ranging from .00 to .90. Four levels of $R^2$ of the additive component of the population models 
were examined: .02, .13, .26, and .50. The first three levels represent small, medium, and large 
effect sizes for the population $R^2$, corresponding to $f^2$ values of .02, .15, and .35 (Cohen, 1988). 
The population $R^2$ value of .50 was included based upon the review of correlational studies 
conducted by Jaccard and Wan (1995). In this review, the 75th percentile of the distribution of 
sample $R^2$ values found in the psychological literature was .50. The magnitudes of the interaction 
component or the non-linearity component were controlled at four levels, representing small, 
medium, and large effect sizes (Cohen, 1988), as well as a null condition. 
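
The correspondence between the $f^2$ benchmarks and the population $R^2$ levels follows from the definition $f^2 = R^2 / (1 - R^2)$, which inverts to $R^2 = f^2 / (1 + f^2)$. A short check, with a hypothetical helper name:

```python
def f2_to_r2(f2):
    """Convert Cohen's f^2 to the equivalent R^2: R^2 = f^2 / (1 + f^2)."""
    return f2 / (1.0 + f2)


# Small, medium, and large benchmarks used in the text.
r2_levels = [round(f2_to_r2(f2), 2) for f2 in (0.02, 0.15, 0.35)]
print(r2_levels)
```

Rounded to two decimals, the three benchmarks give .02, .13, and .26, the first three $R^2$ levels listed above.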

Measurement error was simulated in the data (following the procedure used by Maxwell, 
Delaney & Dill, 1984; and by Jaccard & Wan, 1995) by generating four normally distributed 
random variables for each observation (two to represent "true scores" on the regressors, and two 
to represent errors of measurement). Fallible, observed scores on the regressors were calculated 
(under "classical" measurement theory) as the sum of the true and error variables. The 
reliabilities of the regressors were controlled by adjusting the error variances relative to the true 
score variances. Reliabilities were examined ranging from .40 to 1.00. 
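
The data-generation step can be sketched as follows. Under classical measurement theory, reliability equals the true-score variance divided by the sum of the true-score and error variances, so the error standard deviation needed for a target reliability follows directly. This is a minimal illustration and not the authors' SAS program; the function names, the unit-variance true scores, and the way the true scores are correlated are assumptions.

```python
import math
import random


def error_sd(true_sd, reliability):
    """Error SD giving the target reliability: rel = var_t / (var_t + var_e)."""
    return true_sd * math.sqrt((1.0 - reliability) / reliability)


def simulate_regressors(n, rho, reliability, rng):
    """Draw n pairs of fallible regressor scores (observed = true + error).

    Four normal deviates are used per observation: two true scores
    (correlated at rho, unit variance) and two independent errors.
    """
    sd_e = error_sd(1.0, reliability)
    data = []
    for _ in range(n):
        true_x = rng.gauss(0, 1)
        true_z = rho * true_x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        data.append((true_x + rng.gauss(0, sd_e),
                     true_z + rng.gauss(0, sd_e)))
    return data


sample = simulate_regressors(n=175, rho=0.3, reliability=0.8,
                             rng=random.Random(2024))
```

With a reliability of 1.00 the error standard deviation is zero, so the observed scores equal the true scores.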

Sample sizes of 60, 175 and 400 were used. The larger two of these values represent the 
median and 75th percentile of sample sizes found in Jaccard and Wan's (1995) review of 
correlational studies in psychology. The small sample size (n = 60) was included to extend the 
results to small sample analyses. Five thousand samples of each size were generated for each 
condition in the Monte Carlo study. The use of five thousand replications provides maximum 95% 
confidence intervals of ± .014 around the observed proportion of null hypotheses rejected. 
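
The ± .014 bound can be checked with the normal approximation for a proportion, whose standard error is largest when the proportion is .50. The helper name below is hypothetical.

```python
import math


def max_half_width(n_reps, z=1.96):
    """Widest 95% CI half-width for a proportion (attained at p = .50)."""
    return z * math.sqrt(0.25 / n_reps)


half_width = max_half_width(5000)
print(round(half_width, 3))
```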

For each sample, the entire sample was analyzed using moderated multiple regression, 
then a 4-corners subsample was extracted. The subsample was selected by retaining only the 
most extreme 10% of the observations from each corner of the sample bivariate distribution. The 
4-corners subsample was then analyzed using moderated multiple regression. The moderated 
multiple regression strategy involved fitting four models to each sample (and each 4-corners 
subsample): a linear additive model, a model nonlinear in $X_1$, a model nonlinear in $X_2$, and a 
nonadditive model. Tests for the presence of nonlinearity and nonadditivity were conducted by 
testing the statistical significance of the nonlinear or nonadditive term. 
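
One way the subsample extraction might be implemented is sketched below. The text does not specify how "most extreme" was measured, so the choices here are assumptions: corners are defined as the four quadrants around the sample means, extremity is the product of the absolute standardized deviations on the two regressors, and the top 10% within each quadrant is retained. The function name `four_corners` is hypothetical.

```python
import math
import statistics


def four_corners(points, fraction=0.10):
    """Keep the most extreme `fraction` of (x, z) points in each quadrant."""
    xs = [p[0] for p in points]
    zs = [p[1] for p in points]
    mean_x, mean_z = statistics.mean(xs), statistics.mean(zs)
    sd_x, sd_z = statistics.stdev(xs), statistics.stdev(zs)

    corners = {}
    for p in points:
        dx = (p[0] - mean_x) / sd_x
        dz = (p[1] - mean_z) / sd_z
        # Quadrant key: (x above mean?, z above mean?).
        key = (dx >= 0, dz >= 0)
        # Assumed extremity score: product of absolute standardized deviations.
        corners.setdefault(key, []).append((abs(dx) * abs(dz), p))

    kept = []
    for members in corners.values():
        members.sort(key=lambda m: m[0], reverse=True)
        n_keep = math.ceil(fraction * len(members))
        kept.extend(p for _, p in members[:n_keep])
    return kept


# Small deterministic example: a 10 x 10 grid without the zero row and column.
grid = [(i, j) for i in range(-5, 6) for j in range(-5, 6) if i and j]
subsample = four_corners(grid)
print(len(grid), "->", len(subsample))
```

Because only a fraction of each quadrant is kept, the subsample is much smaller than the full sample, which is the source of the power loss examined in this study.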

The Monte Carlo study was conducted using SAS, Versions 6.06 and 6.08. The 
components of the program were verified by comparing the results with the standard SAS output 
for benchmark data sets. 

Results and Discussion 

In total, 2,304 conditions were examined in the Monte Carlo study (i.e., three models, 
four levels of population $R^2$, four levels of regressor correlation, four levels of regressor 
reliability, three levels of sample size, and four levels of effect size for the nonlinear or 
nonadditive component). To conserve space, and because the results were substantively 
consistent across levels of these design factors, only summary results will be presented here. 
Complete results, however, are available from the authors. 
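
The number of conditions is the product of the numbers of levels in the fully crossed design, which can be verified directly. The dictionary keys below are descriptive labels only.

```python
import math

# Number of levels of each design factor, as listed in the text.
design = {
    "population model": 3,
    "population R^2": 4,
    "regressor correlation": 4,
    "regressor reliability": 4,
    "sample size": 3,
    "effect size": 4,
}

n_conditions = math.prod(design.values())
n_samples = n_conditions * 5000  # five thousand replications per condition
print(n_conditions, n_samples)
```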

The results will be presented in terms of the proportion of samples in which the correct 
population model was identified (i.e., a linear-additive model, a nonadditive model, or a model 
nonlinear in $X_1$), and the proportion of samples in which an incorrect model was identified. Such 
proportions are related to, respectively, the sensitivity and the specificity of these analysis 
strategies. Because the tests for nonlinearity and nonadditivity were conducted independently of 
each other, an individual sample may lead to a rejection of more than one of the null hypotheses. 
Thus, a single sample may suggest both nonlinearity and nonadditivity. 

The results for the linear-additive models are presented in Table 1. This table presents, for 
each nominal alpha level, the proportion of samples that were identified as evidencing 
nonadditivity or nonlinearity in either $X_1$ or $X_2$. Any of these model identifications represents 
a Type I error in the rejection of the null hypothesis of no change in the model $R^2$ relative to a 
linear-additive model. As is evident in this table, the Type I error rate was well controlled 
whether the complete sample was analyzed or whether the four corners of the data were used. In 
each condition, and for each nominal alpha level, the proportion of samples that led to a rejection 
of the null hypothesis was very close to the nominal level of alpha. In addition to verifying the 
Type I error control of these methods of analysis, these results provide a check on the integrity of 
the computer code written for the Monte Carlo study. 
Insert Table 1 about here 



The results for the nonadditive population models are presented in Table 2. The 
proportions presented in this table represent either statistical power (i.e., the proportion of 
samples in which the nonadditive model was identified based upon the sample data), or 
misidentification rates (i.e., the proportion of samples in which each of the nonlinearity null 
hypotheses was rejected). For example, the first row of Table 2 reports the overall results with a 
population $R^2$ of .02 when data were generated from a nonadditive population model. For these 
samples, with a nominal alpha level of .10, the moderated regression model was identified in 
72.9% of the samples when all of the sample data were included in the regression (i.e., a power 
estimate of .729). However, in 40.3% of these samples, a regression model that was nonlinear in 
$X_1$ also fit the data statistically significantly better than the linear-additive model, and in 40.6% 
of the samples a model that was nonlinear in $X_2$ also fit better than the linear-additive model. 
Thus, researchers testing hypotheses about these models would misidentify the population model 
at rates of greater than .40. In contrast, when only the four corners of the samples were used for the 
regression analyses, the correct model was identified in 70.1% of the samples, providing slightly 
less power than was obtained with the use of the full samples. However, the estimated rates of 
misidentifying the model as nonlinear were also lower than those obtained with the full samples 
(giving estimates of .360 for nonlinear $X_1$ and .361 for nonlinear $X_2$). Both the statistical power 
and Type I error rate estimates remained relatively stable across levels of $R^2$ in the population. In 
each case, the use of the full samples provided greater statistical power, but such power 
advantages were accompanied by greater probabilities of misidentifying the population as 
nonlinear. 



Insert Table 2 about here 



A similar result was obtained for the level of correlation between the regressors. Across 
all levels, the use of the full sample provided greater statistical power but higher misidentification 
rates. In contrast to the effects of population $R^2$, the level of correlation between the regressors 
affected both the power and the Type I error rates of these tests. Specifically, as the correlation 
between the regressors increased, the statistical power increased for both the full sample and the 
four-corners strategies. However, a concomitant decrease in the specificity was also evident for both 
strategies. For example, at a nominal alpha level of .05, when the regressors were uncorrelated, 
the power of the full sample analysis was .614 while that of the 4-corners analysis was .591. 
Both analysis strategies also evidenced low levels of model misidentification, with the nonlinear 
$X_1$ model identified only 8.5% of the time with the full samples and only 7.7% of the time with 
the 4-corners. In contrast, when the level of correlation between the regressors was .80, the 
power to detect the moderating model increased to .733 for the full samples and to .680 for the 
4-corners. However, the misidentification rate increased to 64.5% for the full samples and to 
56.9% for the four corners. 

The effect of regressor reliability was similar to that of regressor intercorrelation, 
although the effect was smaller in magnitude. That is, increasing regressor reliability led to 
increased power in the test for the moderating model, but also led to increased 
misidentification rates. For example, at a nominal alpha level of .05, when the regressor 
reliability was .40, the power of the full sample analysis was .452 while that of the 4-corners 
analysis was .413. Both analysis strategies also evidenced relatively low levels of model 
misidentification, with the nonlinear $X_1$ model identified only 20.0% of the time with the full 
samples and only 14.8% of the time with the 4-corners. In contrast, when the reliability of the 
regressors was 1.00, the power to detect the moderating model increased to .814 for the full 
samples and to .789 for the 4-corners. However, the misidentification rate increased to 44.6% for 
the full samples and to 41.3% for the four corners. 

Finally, as should be expected, increasing sample sizes or increasing the population 
effect sizes led to increases in power of the test for the moderated regression model, but also to 
concomitant increases in the rates of misidentifying the model. Such effects were evident for 
both the full sample analyses and the 4-corners analyses. In all conditions, however, the power 
of the full sample analyses was greater than that of the four-corners analyses. 

The results for the nonlinear $X_1$ population models are presented in Table 3. As with the 
results presented in Table 2, the proportions presented in this table represent either statistical 
power (i.e., the proportion of samples in which the nonlinear $X_1$ model was identified based upon 
the sample data), or misidentification rates (i.e., the proportion of samples in which the null 
hypothesis of nonadditivity or the null hypothesis of nonlinearity in $X_2$ was rejected). 



Insert Table 3 about here 

The results for the nonlinear population models were nearly identical to those for the 
nonadditive models. That is, the use of the full samples was associated with greater statistical 
power but also with higher rates of misidentification. For both the full sample and the 4-corners 
strategies, the power increased with increasing correlation between regressors, with increasing 
reliability of regressors, with increasing sample size, and with increasing effect size. However, 
while these factors led to greater statistical power, they also led to higher rates of 
misidentification. 

The results suggest that the use of the 4-corners strategy rather than the full sample 
analysis has both benefits and costs. Specifically, the 4-corners strategy evidenced better 
specificity (i.e., lower misidentification rates), but at the expense of reduced statistical power, or 
sensitivity, relative to the full sample analysis. Despite the improved specificity of the 4-corners 
approach, the model misidentification rates were distressingly high in many of the conditions 
examined. With increasing regressor intercorrelations, increasing reliability of regressors, 
increasing sample size, and increasing effect size, the probabilities of rejecting null hypotheses 
associated with the incorrect functional form of the model increased along with the statistical 
power of the test for the correct functional form. Thus, the utility of either the 4-corners 
approach or the full sample approach for testing theory is limited. 

For example, in applied research, if a particular theory suggests that a nonadditive model 
should be present in a population, researchers may collect a sample of data from that population 
and test the null hypothesis of nonadditivity in the sample. If the theory is correct, the use of the 
full sample will lead to a greater chance of identifying the nonadditivity (i.e., researchers will 
have more statistical power by using the full sample rather than the four corners). However, if 
the theory is wrong and the population actually evidences nonlinearity rather than nonadditivity, 
then the use of the full sample will lead to a greater chance of misidentifying the model as 
nonadditive. That is, the full sample analysis is more likely to provide support for a theory that 
is wrong, while the subsample approach is less likely to support such a theory. However, in 
many conditions neither approach should provide prudent researchers with much confidence that 
the correct model has been identified. 